Stochastic Gradient Descent with Polyak’s Learning Rate
Authors
Abstract
Stochastic gradient descent (SGD) for strongly convex functions converges at the rate $$\mathcal{O}(1/k)$$. However, achieving good results in practice requires tuning the parameters of the algorithm, for example the learning rate. In this paper we propose a generalization of the Polyak step size, used in subgradient methods, to stochastic gradient descent. We prove a non-asymptotic convergence rate whose constant can be better than that of the corresponding optimally scheduled SGD. We demonstrate that the method is effective in practice, and on optimization problems arising in training deep neural networks we compare against the theoretical rate.
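To make the idea concrete: the classical Polyak step size for subgradient methods is $$\gamma_k = (f(x_k) - f^*)/\|g_k\|^2$$, where $$f^*$$ is the optimal value and $$g_k$$ a subgradient at $$x_k$$. The sketch below applies a per-sample version of this rule inside SGD. It is only an illustrative Python sketch, not the authors' exact algorithm; the cap `gamma_max`, the per-sample optimum estimate `f_star`, and the helper functions `loss_fi`/`grad_fi` are assumptions introduced here.

```python
import numpy as np

def sgd_polyak(loss_fi, grad_fi, x0, n_samples, n_steps,
               f_star=0.0, gamma_max=1.0, seed=0):
    """SGD where each step uses a per-sample Polyak-type step size:
    gamma = (f_i(x) - f_star) / ||grad f_i(x)||^2, capped at gamma_max.
    f_star approximates the per-sample optimal value (0 for models
    that can fit every sample exactly)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        i = rng.integers(n_samples)        # draw one sample index
        g = grad_fi(x, i)                  # stochastic gradient for that sample
        sq_norm = float(np.dot(g, g))
        if sq_norm == 0.0:                 # this sample is already fit; skip
            continue
        gamma = min((loss_fi(x, i) - f_star) / sq_norm, gamma_max)
        x -= max(gamma, 0.0) * g           # never take a negative step
    return x

# Toy usage (assumed setup): noiseless least squares, so each per-sample optimum is 0.
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 5))
b = A @ np.ones(5)
loss_fi = lambda x, i: (A[i] @ x - b[i]) ** 2
grad_fi = lambda x, i: 2.0 * (A[i] @ x - b[i]) * A[i]
x_hat = sgd_polyak(loss_fi, grad_fi, np.zeros(5), n_samples=100, n_steps=3000)
```

The appeal of this rule is that the step size adapts to the current suboptimality of the sampled loss rather than following a fixed schedule; the paper's precise step-size rule and constants are given in the full text.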
Similar resources
Learning Rate Adaptation in Stochastic Gradient Descent
The efficient supervised training of artificial neural networks is commonly viewed as the minimization of an error function that depends on the weights of the network. This perspective gives some advantage to the development of effective training algorithms, because the problem of minimizing a function is well known in the field of numerical analysis. Typically, deterministic minimization metho...
Chapter 2: Learning Rate Adaptation in Stochastic Gradient Descent
Scaled Gradient Descent Learning Rate
Adaptive behaviour through machine learning is challenging in many real-world applications such as robotics. This is because learning has to be rapid enough to be performed in real time and to avoid damage to the robot. Models using linear function approximation are interesting in such tasks because they offer rapid learning and have small memory and processing requirements. Adalines are a simp...
Online Learning, Stability, and Stochastic Gradient Descent
In batch learning, stability together with existence and uniqueness of the solution corresponds to well-posedness of Empirical Risk Minimization (ERM) methods; recently, it was proved that CVloo stability is necessary and sufficient for generalization and consistency of ERM ([9]). In this note, we introduce CVon stability, which plays a similar role in online learning. We show that stochastic g...
Stochastic Gradient Descent with GPGPU
We show how to optimize a Support Vector Machine and a predictor for Collaborative Filtering with Stochastic Gradient Descent on the GPU, achieving speedups of 1.66x to 6x compared to a CPU-based implementation. The reference implementations are the Support Vector Machine by Bottou and the BRISMF predictor from the Netflix Prize winning team. Our main idea is to create a hash function of ...
Journal
Journal title: Journal of Scientific Computing
Year: 2021
ISSN: 1573-7691, 0885-7474
DOI: https://doi.org/10.1007/s10915-021-01628-3